Add HealthCheckRuntime context manager for shared boilerplate [1/2] #77
gustcol wants to merge 2 commits into facebookresearch:main
Extract the ~30 lines of repeated setup code (logger init, GPU node ID detection, derived cluster resolution, `TelemetryContext` + `OutputContext` nesting, killswitch check) into a reusable `HealthCheckRuntime` dataclass context manager. This reduces per-subcommand boilerplate from ~30 lines to ~5 lines.

The helper is purely additive: existing checks continue to work unchanged. New checks can use `with HealthCheckRuntime(...) as rt:` instead of manually wiring up the setup ceremony.

Includes comprehensive tests covering field initialization, killswitch behavior, context manager nesting, GPU node ID failure handling, and the `finish()` convenience method.

Refs: facebookresearch#75
Apply ufmt formatting and fix mypy errors in the test helper by using explicit typed parameters instead of `**kwargs` dict unpacking.
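Why explicit parameters help: mypy typically infers a dict literal with mixed value types (for example `str` and `int`) as `dict[str, object]`, so `**`-unpacking it into a typed constructor cannot be checked against the individual parameter types. A minimal sketch of the explicit-parameter alternative follows; `CheckConfig` and `make_config` are hypothetical names, not the PR's actual test helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckConfig:
    """Hypothetical object that a test helper might construct."""

    name: str
    retries: int
    cluster: Optional[str] = None


# Pattern mypy rejects: the literal below is inferred as dict[str, object],
# so the unpacked values cannot be matched to the `str` / `int` parameters.
#
#   defaults = {"name": "test", "retries": 3}
#   CheckConfig(**defaults)
#
# Pattern mypy accepts: every parameter is spelled out with its own type,
# and callers override only what they need.
def make_config(
    name: str = "test",
    retries: int = 3,
    cluster: Optional[str] = None,
) -> CheckConfig:
    return CheckConfig(name=name, retries=retries, cluster=cluster)
```

A test can then call `make_config(retries=5)` and still get full type checking on the override.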
Claude finished @luccabb's task in 2m 23s.

Code Review
Overall this is a well-structured abstraction that genuinely reduces boilerplate. The dataclass + context manager pattern is clean, test coverage is solid, and the …
| Severity | Finding |
|---|---|
| 🔴 Bug | `sys.exit()` in `__enter__` bypasses `TelemetryContext`/`OutputContext` cleanup on killswitch |
| 🟡 Test gap | `test_killswitch_enabled_exits_ok` doesn't verify that the contexts' `__exit__` is called |
| 🟢 Minor | Direct `_stack.__enter__()` call needs an explanatory comment |
| 🟢 Nit | Missing docstrings on the class and `finish()` |
| 🟢 Nit | Mixed `%s` and f-string logging styles |
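The 🔴 finding follows from how `with` works: `__exit__` is only called after `__enter__` returns, so if `__enter__` raises (and `sys.exit()` raises `SystemExit`), contexts already pushed onto the `ExitStack` are never closed. A minimal sketch of one possible fix, assuming the runtime enters its contexts through an `ExitStack` inside `__enter__`: unwind the stack before re-raising. `KillswitchRuntime`, `RecordingContext`, and the `events` list are hypothetical stand-ins, not the PR's code; the recorded events also show the kind of assertion the 🟡 test gap asks for.

```python
import sys
from contextlib import ExitStack
from dataclasses import dataclass, field
from typing import List

# Records enter/exit order so cleanup can be asserted.
events: List[str] = []


class RecordingContext:
    """Hypothetical stand-in for TelemetryContext / OutputContext."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __enter__(self) -> "RecordingContext":
        events.append(f"enter {self.name}")
        return self

    def __exit__(self, *exc_info: object) -> None:
        events.append(f"exit {self.name}")


@dataclass
class KillswitchRuntime:
    """Hypothetical runtime with cleanup-safe killswitch handling."""

    killswitch_enabled: bool = False
    _stack: ExitStack = field(default_factory=ExitStack, init=False, repr=False)

    def __enter__(self) -> "KillswitchRuntime":
        try:
            self._stack.enter_context(RecordingContext("telemetry"))
            self._stack.enter_context(RecordingContext("output"))
            if self.killswitch_enabled:
                # Without the except block below, this SystemExit would
                # leave both contexts open: __exit__ is never reached.
                sys.exit(0)
        except BaseException:
            # Close everything entered so far (innermost first), then re-raise.
            self._stack.close()
            raise
        return self

    def __exit__(self, *exc_info: object) -> None:
        self._stack.close()
```

If the killswitch path does not need telemetry or output reporting, checking it before entering any contexts would be a simpler alternative; which option fits depends on what the real contexts are expected to record.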
Summary
Ref: #75
- Adds `HealthCheckRuntime`, a `@dataclass` context manager that encapsulates the ~30 lines of repeated setup code that every health check subcommand duplicates: logger initialization, GPU node ID detection, derived cluster resolution, an `ExitStack` with `TelemetryContext` + `OutputContext`, and killswitch checking
- Reduces each subcommand's setup to ~5 lines (a single `with HealthCheckRuntime(...) as rt:` block)
- Stacked PR series: [1/2] Runtime helper → [2/2] Scaffold tool (depends on this PR)
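The setup steps listed above map naturally onto a dataclass whose `__enter__` does the shared wiring. The sketch below is illustrative only: `RuntimeSketch`, its fields, the `get_node_id`/`resolve_cluster` hooks, and the `finish()` behavior (recording and returning an exit code) are assumptions rather than the PR's API, and the telemetry, output, and killswitch steps are left as a comment.

```python
import logging
from contextlib import ExitStack
from dataclasses import dataclass, field
from typing import Callable, Optional


def _unknown_node_id() -> str:
    # Hypothetical default; real code would query the GPU node.
    return "unknown-node"


def _default_cluster() -> str:
    # Hypothetical default; real code would derive the cluster name.
    return "default-cluster"


@dataclass
class RuntimeSketch:
    """Hypothetical shared-setup context manager (not the PR's implementation)."""

    check_name: str
    cluster: Optional[str] = None
    get_node_id: Callable[[], str] = _unknown_node_id
    resolve_cluster: Callable[[], str] = _default_cluster
    node_id: Optional[str] = field(default=None, init=False)
    exit_code: Optional[int] = field(default=None, init=False)
    logger: logging.Logger = field(init=False, repr=False)
    _stack: ExitStack = field(default_factory=ExitStack, init=False, repr=False)

    def __post_init__(self) -> None:
        # Step 1: logger initialization.
        self.logger = logging.getLogger(self.check_name)

    def __enter__(self) -> "RuntimeSketch":
        # Step 2: GPU node ID detection; a failure is logged, not fatal.
        try:
            self.node_id = self.get_node_id()
        except Exception:
            self.logger.warning("GPU node ID detection failed for %s", self.check_name)
        # Step 3: derived cluster resolution when none was passed in.
        if self.cluster is None:
            self.cluster = self.resolve_cluster()
        # Step 4 (omitted): push TelemetryContext/OutputContext with
        # self._stack.enter_context(...) and check the killswitch.
        return self

    def finish(self, exit_code: int) -> int:
        # Record and return the check's exit code in one call.
        self.exit_code = exit_code
        return exit_code

    def __exit__(self, *exc_info: object) -> None:
        self._stack.close()
```

A subcommand body then reduces to `with RuntimeSketch("check_foo") as rt:` followed by the check logic and `return rt.finish(0)`. Passing the lookups in as callables is one way to exercise the node-ID failure path in tests without real GPUs.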
Before (~30 lines per subcommand)
After (~5 lines per subcommand)
Test plan
- `nox -s tests -- gcm/tests/health_checks_tests/test_runtime.py`: 6 tests covering initialization, killswitch behavior, `finish()`, context nesting, and GPU node ID failure
- `nox -s lint`
- `nox -s format`
- `nox -s typecheck`